News organisations don’t have a “genuine choice” about whether to block Google from scraping their content for its AI services, one publisher has warned.
Matt Rogerson, director of global public policy and platform strategy at the FT and former Guardian Media Group director of public policy, argued that Google’s “social contract” with publishers – through which it provided value to the industry by sending traffic to their sites – has been broken.
This is because Google now publishes summaries of those publishers’ articles in AI Overviews at the top of many search results – and also sells data generated via its search crawler to third-party large language models (LLMs).
[Read more: Google AI Overviews breaks search giant’s grand bargain with publishers]
In September last year Google introduced Google-Extended, a control which allows website owners to block its AI chatbot Gemini (formerly Bard) and its AI development platform Vertex from scraping their content.
However, Google-Extended does not stop content from being accessed and used in Google’s AI Overviews summaries, meaning that to avoid this publishers would have to opt out of being crawled by Googlebot, which indexes content for Google Search.
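In practice this distinction plays out in a site’s robots.txt file, where each crawler is addressed by its user agent token. A minimal sketch of the choice publishers face (illustrative only, not a recommendation):

```text
# Block Gemini and Vertex AI from using the site's content,
# while remaining indexed in Google Search
User-agent: Google-Extended
Disallow: /

# Googlebot must stay allowed for the site to appear in Search,
# but its crawl also feeds AI Overviews; there is no separate
# robots.txt opt-out for AI Overviews alone
User-agent: Googlebot
Allow: /
```

Disallowing Googlebot as well would remove the AI Overviews exposure, but at the cost of disappearing from Google Search entirely.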
Rogerson said the presence of Googlebot on “almost the vast majority of websites on the open web enables Google unparalleled access to IP published online” and added that this IP is “now being used to enable Google’s LLMs, and those of third party companies such as Meta, to respond accurately to user queries in real-time”.
In a letter to Baroness Stowell, chair of the House of Lords Communications and Digital Committee, Rogerson said: “This leaves website owners with an unenviable choice.
“To opt-out of the Google Search crawler entirely, and become invisible to the 90%+ of the UK population that currently uses Google Search, or allow scraping to continue in ways that both extract value without compensation, and undermine nascent commercial licensing markets for the use of high quality IP to build and enable the AI models of the future.”
Rogerson’s letter was triggered by Media Minister Stephanie Peacock inaccurately stating in a Future of News inquiry hearing earlier in October that the FT “has an agreement with Google”.
Peacock said licensing approaches like these are “obviously welcome” but there is “no consistency to it. It is quite piecemeal, and there is definitely a question around making it more consistent.”
The FT has not done any deal with Google for the use of its content in LLMs and other AI products, although it has previously been a partner of the tech giant in other projects like the Google News Showcase aggregation service.
The FT has separately signed a licensing agreement with OpenAI and a “number of other agreements for innovative AI related partnerships” including a private beta test with Prorata.ai, a start-up developing technology for generative AI platforms to share revenue with publishers each time their content is used to generate an answer.
Rogerson said of the FT’s OpenAI and Prorata deals: “Both of these agreements begin to align the incentives of AI platforms and publishers in the interests of quality journalism, the reader and respect for IP.
“We strongly believe that sharing revenues between technology companies that use IP and the publishers that create it – can help develop a healthier and fairer information ecosystem that encourages accurate and authoritative journalism and rightly rewards those who produce it.”
He added that this goal of aligning incentives is “being undermined by the scraping practices of incumbent technology companies” including Google.
He said publishers still consent to the scraping of their IP by Google – which uses it in its own LLMs and sells it to companies like Meta – because they want to appear in the tech giant’s dominant search engine.
But he said this “means that those companies extract commercial value from the source material, without a user ever engaging with the source of that information.
“From Wikipedia to the Watford Observer, websites rely on engagement with users: engagement that is generated by the content invested in and generated by those sites. Without such engagement the ability to generate any of those revenue streams disappears. This was the social contract of the open web, that value would be shared between search and social gateways and the investors in intellectual property.”
A Google spokesperson said in response: “Every day, Google sends billions of clicks to sites across the web, and we intend for this long-established value exchange with publishers to continue.
“With AI Overviews, people find Search more helpful and they’re coming back to search more, creating new opportunities for content to be discovered. People are using AI Overviews to discover more of the web, and we’re continuing to improve the experience to make that even easier.
“We also provide web publishers with a range of controls to indicate how much of their content is eligible to display in Search.” These controls also cover AI Overviews.
Google claims, although it has not yet shared data to support this, that clicks from AI Overviews are higher quality because users are more likely to spend longer on the sites they click through to.
It says Googlebot is used in AI Overviews because AI has long been built into search and is integral to how it functions.
And it tells publishers that do not want their content to appear in AI Overviews to use the nosnippet meta tag and the data-nosnippet attribute to limit the visibility of specific pages or parts of a page – similar to how they could previously control whether they appeared as featured snippets at the top of results.
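These two controls work at different levels of granularity – the meta tag applies to a whole page, while the attribute fences off individual elements. A brief sketch of how they appear in a page’s markup (the sample text is illustrative):

```html
<!-- Page-level: exclude this entire page's text from search
     snippets, and by extension from AI Overviews -->
<meta name="robots" content="nosnippet">

<!-- Element-level: only the marked section is withheld;
     the rest of the page remains eligible for snippets -->
<p data-nosnippet>This paragraph will not be shown in snippets.</p>
```

Note that neither control removes a page from Google’s index; they only restrict how much of its text Google may reproduce in results.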